indicating missing numerical data is to leave it blank. Most software treats blank cells as missing
data in a calculation, but this changes depending on the software, so it’s important to confirm
missing values handling in your analysis.
Entering date and time data
Now we’re going to tell you something that sounds like we’re contradicting the advice we just gave
you (but, of course, we’re not!). Most statistical software (including Microsoft Excel) can represent
dates and times as a single variable (an “instant” on a continuous timeline), so take advantage of that if
you can. In Excel, you can enter the date and time as one variable (for example, 07/15/2020 08:23),
not as a separate date variable and a time variable. This method is especially useful when dealing with
events that take place over a short time interval (like events occurring during a surgical procedure). It
is important to collect all potential start and end dates so any duration during the study can be
calculated.
Some programs may store a date and time as a Julian Date, whose zero occurred at noon, Greenwich
Mean Time, on Jan. 1, 4713 BC. (Nothing happened on that date; it’s purely a numerical convenience.)
What if you don’t know the day of the month? This happens a lot with medical history items; a
participant may say, “I got the flu in September 2021.” Most software (including Excel) insists
that a date variable be a complete date, and won’t accept just a month and a year. In this case, a
business rule is created to set the day (as either the 1st, 15th, or last day of the month). Similarly,
if both the month and day are missing, you can set up a business rule to estimate both.
If you impute a date, just create a new column with the imputed date, because you want to be cautious.
Make sure to keep the original partial date for traceability. Any date imputation should be consistent
with the study protocol, and not bias the results. Completely missing dates should be left blank, as
statistical software treats blank cells as missing data.
Because of the way most statistics programs store dates and times, they can easily calculate
intervals between any two points in time by simple subtraction. It is best practices to store raw
dates and times, and let the computer calculate the intervals later (rather than calculate them
yourself). For example, if you create variables for date of birth (DOB) and a visit date (VisDt) in
Excel, you can calculate an accurate age at the time of the visit with this formula:
Checking Your Entered Data for Errors
After you’ve entered all your data into the computer, there are a few things you can do to check
for errors: